-
Notifications
You must be signed in to change notification settings - Fork 682
Add mechanism to detect leaked references to mimirpb buffers #13609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
💻 Deploy preview available (Add mechanism to detect leaked references to mimirpb buffers): |
|
I expected the revert 69ca33a to make the regression test I added to fail, but it doest in CI? It fails locally though:
|
8abb9c9 to
9d6d2d0
Compare
|
There's the regression failure, after reverting the fix: I'll now reapply the fix. |
pracucci
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick look, not enough to do a proper review. I just left a couple of nits I noticed.
pkg/mimirpb/custom.go
Outdated
| // Unmarshal unmarshals an object using the global codec. Prefer this over calling | ||
| // the Unmarshal method directly, as it will take advantage of pools and leak | ||
| // detection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using it everywhere and add a faillint rule (defined in the Makefile) if the Unmarshal-function-with-receiver is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a good idea long-term, but maybe for now it's better to restrict its usage to the places I know work fine, and then introduce it everywhere else progressively. What do you think?
pkg/mimir/mimir.go
Outdated
| func (c *CommonConfig) RegisterFlags(f *flag.FlagSet) { | ||
| c.Storage.RegisterFlagsWithPrefix("common.storage.", f) | ||
| c.ClientClusterValidation.RegisterFlagsWithPrefix("common.client-cluster-validation.", f) | ||
| f.Float64Var(&c.InstrumentRefLeaksPct, "common.instrument-reference-leaks-pct", 0, `Percentage of buffers from pools to instrument for reference leaks. 0 to disable.`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I suggest to change "pct" to "percentage". It will be consistent with other percentage-based settings we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tacole02
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs look good! I left a few minor suggestions. Thank you!
Co-authored-by: Taylor C <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Missing initialization for YAML unmarshaling field
The InstrumentRefLeaksPct field is defined in commonConfigUnmarshaler but not initialized when creating the unmarshaler in UnmarshalCommonYAML. This means setting instrument_ref_leaks_pct under the common: section in YAML config will fail. Either the field should be removed from commonConfigUnmarshaler if inheritance isn't needed, or it should be initialized here like the other fields.
pkg/mimir/mimir.go#L716-L721
Lines 716 to 721 in da56796
| common := configWithCustomCommonUnmarshaler{ | |
| Common: &commonConfigUnmarshaler{ | |
| Storage: &specificStorageLocations, | |
| ClientClusterValidation: &specificClusterValidationLocations, | |
| }, |
pkg/mimirpb/custom.go
Outdated
| buf := b.ReadOnlyData() | ||
| ptr := unsafe.SliceData(buf) | ||
| allPages := unsafe.Slice(ptr, roundUpToMultiple(len(buf), pageSize)) | ||
| err := syscall.Munmap(allPages) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If buf is (say) (0x70000000, 0x74000000] and immediately after this Munmap call returns, another instrumented pool buffer is mmapped that at least partially overlaps with that same range, then there can be times when dangling references to this buf are permitted to dereference data in the new buf without error. It may be more confidence-inspiring to claim "our new build served 500,000 requests out of the leak-instrumented pool with zero use-after-frees" if the guard mechanism is precise.
The idea I had was to have a long-lived MMapped buffer that has a cool-down time that is >> any reasonable request processing time. Which would have the same problem described above but only if there's a pathological request that runs for more than the cooldown time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same concern initially, but then noticed the docs for munmap say:
The munmap() system call deletes the mappings for the specified address range, and causes further references to addresses within the range to generate invalid memory references.
It seems to me the emphasized part guarantees that the address space range won't be reused, but maybe there's an implicit "... until the addresses are reused" in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more googling/LLMing and a short foray into OpenBSD's implementation point in the direction that you are right, and the address space can potentially be reused. Thanks for challenging this!
See b39485d
|
|
||
| func deserializeRecordContentV1(content []byte, wr *mimirpb.PreallocWriteRequest) error { | ||
| return wr.Unmarshal(content) | ||
| return mimirpb.Unmarshal(content, wr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Missing FreeBuffer() call causes memory leak in block builder
The change from wr.Unmarshal(content) to mimirpb.Unmarshal(content, wr) now sets a buffer reference on the PreallocWriteRequest, requiring callers to invoke FreeBuffer() when done. While Ingester.PushToStorageAndReleaseRequest correctly calls req.FreeBuffer(), TSDBBuilder.PushToStorageAndReleaseRequest in pkg/blockbuilder/tsdb.go only calls mimirpb.ReuseSlice(req.Timeseries) without freeing the buffer. When reference leak instrumentation is enabled via common.instrument-reference-leaks-percentage, this causes mmap'd buffers to never be unmapped, resulting in memory leaks in the block builder path.
Additional Locations (1)
| var unmapQueue chan unmapTask | ||
|
|
||
| var startFreeingInstrumentedBuffers = sync.OnceFunc(func() { | ||
| unmapQueue = make(chan unmapTask, 1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sets a limit of 1000 WriteRequest buffers held up waiting. Is that too much maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I read this PR one concern I have is a queueing pileup one - we can dial in that X% of requests should get the instrumented pool, but in some cell it may work out that X% of requests may be a tipping point where requests take longer and longer and the number of mmapped regions grows until the process crashes.
To connect it to your question here, what if we have both an "X%" config knob but also a "maximum instrumented pools" knob so that we never exceed that limit. And then you don't need to limit the unmaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, added a knob at f44250c
I have kept behavior of immediately unmapping if unmapQueue is full because it seems to me it's saner than the alternative of blocking and waiting.
pkg/mimirpb/custom.go
Outdated
| allPages := unsafe.Slice(ptr, roundUpToMultiple(len(buf), pageSize)) | ||
| if b.waitBeforeReuse > 0 { | ||
| select { | ||
| case unmapQueue <- unmapTask{buf: allPages, at: time.Now().Add(b.waitBeforeReuse)}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this now needs mprotect() so that the memory is rendered inaccessible between the free and unmap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed 🤦 I still need to actually verify this new machinery works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed at 325f3c6
pkg/mimirpb/custom.go
Outdated
| panic(fmt.Errorf("mprotect: %w", err)) | ||
| } | ||
| select { | ||
| case unmapQueue <- unmapTask{buf: allPages, at: time.Now().Add(b.waitBeforeReuse)}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Missing field causes nil pointer dereference in unmap
When sending an unmapTask to unmapQueue, the inflightInstrumentedBytes field is omitted from the struct literal, leaving it as nil. Later, when the background goroutine processes this task and calls unmap(), it will call inflightInstrumentedBytes.Sub() on a nil pointer, causing a panic. The struct literal needs to include inflightInstrumentedBytes: b.inflightInstrumentedBytes.
Additional Locations (1)
|
|
||
| require.NoError(t, err) | ||
| wr.ClearTimeseriesUnmarshalData() | ||
| wr.BufferHolder = mimirpb.BufferHolder{} // We don't want to compare this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Test missing defer FreeBuffer causes memory leak
The v1 test case is missing defer wr.FreeBuffer() that was added to the v0 test case. This causes the mmap'd buffer to never be freed in tests. Since tests run with 100% instrumentation, every buffer is mmap'd, and without FreeBuffer() being called, the mmap'd memory is leaked. Additionally, the v0 test's defer wr.FreeBuffer() is also ineffective because the BufferHolder is immediately zeroed on line 101, clearing the buffer reference before the defer can run.
|
Testing ff97b8c under some real load, it's apparent that we're not calling
|

What this PR does
Adds a
common.instrument-reference-leaks-pctflag. When set to >0, a percentage of gRPC BufferHolder objects (most prominently, WriteRequest) will be instrumented so that, once the buffer's reference count reaches zero, if there are remaining references to the buffer, a panic is thrown.For this to work, either the object needs to come from a generated gRPC server using our custom global codec, or the newly introduced
mimirpb.Unmarshalmust be used.Since ingester storage doesn't receive WriteRequests from gRPC anymore but from Kafka,
storage/ingestnow callsmimirpb.Unmarshal.To-do: plumb this mechanism with the distributor too, which has its own unmarshaling path.
Which issue(s) this PR fixes or relates to
Follow up of #13573
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]. If changelog entry is not needed, please add thechangelog-not-neededlabel to the PR.about-versioning.mdupdated with experimental features.Note
Introduces an experimental mechanism and flags to instrument a fraction of BufferHolder-backed gRPC buffers and detect leaked references, wiring it into the global codec and unmarshalling paths.
mimirpb.CustomCodecConfigand global codec registration; optional leak instrumentation using dedicated pages and delayed unmapping to detect stale references.mimirpb.Unmarshal(...)helper to use the global codec; replace direct unmarshalling instorage/ingestpaths.pkg/blockbuilder/tsdb: ensure request buffers are freed viadefer req.FreeBuffer().common.instrument_ref_leakswith flags:-common.instrument-reference-leaks.percentage,-common.instrument-reference-leaks.before-reuse-period,-common.instrument-reference-leaks.max-inflight-instrumented-bytes.CommonConfig, inheritance, help text, descriptors, defaults, and process setup.Written by Cursor Bugbot for commit ff97b8c. This will update automatically on new commits. Configure here.